TUHOI: Trento Universal Human Object Interaction Dataset

Authors

  • Dieu-Thu Le
  • Jasper R. R. Uijlings
  • Raffaella Bernardi
Abstract

This paper describes the Trento Universal Human Object Interaction dataset, TUHOI, which is dedicated to human-object interactions in images. Recognizing human actions is an important yet challenging task. Most available datasets in this field are limited in the number of actions and objects they cover. A large dataset with diverse actions and human-object interactions is needed for training and evaluating complex and robust human action recognition systems, especially systems that combine knowledge learned from language and vision. We introduce an image collection with more than two thousand actions, annotated through crowdsourcing. We review publicly available datasets, describe the annotation process for our image collection, and report statistics of the resulting dataset. Finally, we present experimental results on the dataset, including human action recognition based on objects and an analysis of the relation between human-object positions in images and prepositions in language.
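
As an illustration of the kind of analysis mentioned at the end of the abstract, the sketch below maps the relative position of a person bounding box and an object bounding box to a coarse spatial preposition. It is a minimal, hypothetical Python example; the box representation, the heuristic thresholds, and the preposition labels are assumptions and are not taken from the TUHOI annotation format or the authors' code.

    # Hypothetical sketch (not the authors' code): map the relative position of a
    # person box and an object box to a coarse spatial preposition, in the spirit
    # of the paper's analysis of human-object positions versus prepositions.
    from dataclasses import dataclass


    @dataclass
    class Box:
        x: float  # top-left corner, image coordinates (y grows downwards)
        y: float
        w: float
        h: float

        @property
        def center(self):
            return (self.x + self.w / 2.0, self.y + self.h / 2.0)

        def overlap_ratio(self, other):
            # Intersection area divided by this box's own area.
            ix = max(0.0, min(self.x + self.w, other.x + other.w) - max(self.x, other.x))
            iy = max(0.0, min(self.y + self.h, other.y + other.h) - max(self.y, other.y))
            return (ix * iy) / (self.w * self.h)


    def spatial_preposition(person, obj):
        # Very coarse heuristic; thresholds and labels are assumptions.
        if person.overlap_ratio(obj) > 0.5:
            return "on"
        (px, py), (ox, oy) = person.center, obj.center
        if abs(py - oy) > abs(px - ox):
            return "above" if py < oy else "below"
        return "next to"


    # Example: a person standing to the left of a bicycle -> "next to"
    print(spatial_preposition(Box(100, 50, 60, 160), Box(170, 120, 120, 90)))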


Similar articles

Disambiguating Visual Verbs

In this article, we introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image. Just as textual word sense disambiguation is useful for a wide range of NLP tasks, visual sense disambiguation can be useful for multimodal tasks such as image retrieval, image description...


Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings

We introduce a new task, visual sense disambiguation for verbs: given an image and a verb, assign the correct sense of the verb, i.e., the one that describes the action depicted in the image. Just as textual word sense disambiguation is useful for a wide range of NLP tasks, visual sense disambiguation can be useful for multimodal tasks such as image retrieval, image description, and text illust...
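
A minimal sketch of how such unsupervised disambiguation could work, assuming image and verb-sense embeddings that live in a shared multimodal space: the predicted sense is simply the one whose embedding is most similar (by cosine) to the image embedding. The vectors and sense labels below are toy placeholders, not the paper's actual model.

    # Hypothetical sketch (assumed interface, not the paper's implementation):
    # choose the verb sense whose embedding is closest to the image embedding,
    # assuming both live in the same multimodal space.
    import numpy as np


    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


    def disambiguate(image_vec, sense_vecs):
        # sense_vecs: dict mapping a sense label to its embedding vector.
        return max(sense_vecs, key=lambda sense: cosine(image_vec, sense_vecs[sense]))


    # Toy example with random vectors standing in for real multimodal embeddings.
    rng = np.random.default_rng(0)
    image = rng.normal(size=128)
    senses = {"play (music)": rng.normal(size=128), "play (sport)": rng.normal(size=128)}
    print(disambiguate(image, senses))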


Finding Regions of Interest from Multimodal Human-Robot Interactions

Learning new concepts, such as object models, from human-robot interactions entails different recognition capabilities on a robotic platform. This work proposes a hierarchical approach to address the extra challenges of natural interaction scenarios by exploiting multimodal data. First, a speech-guided recognition of the type of interaction happening is presented. This first step facilitates t...


Identification and Characterization of Human Behavior Patterns from Mobile Phone Data

Pavlos Paraskevopoulos (University of Trento, Italy; Telecom Italia SKIL), Thanh-Cong Dinh (University of Trento, Italy; Telecom Italia SKIL), Zolzaya Dashdorj (University of Trento, Italy; Telecom Italia SKIL; Fondazione Bruno Kessler), Themis Palpanas (University of Trento, Italy), Luciano Serafini (Fondazione ...


Deep Q-learning for Active Recognition of GERMS: Baseline performance on a standardized dataset for active learning

In this paper, we introduce GERMS, a dataset designed to accelerate progress on active object recognition in the context of human robot interaction. GERMS consists of a collection of videos taken from the point of view of a humanoid robot that receives objects from humans and actively examines them. GERMS provides methods to simulate, evaluate, and compare active object recognition approaches t...
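
For illustration only, the sketch below shows the epsilon-greedy action-selection step that a Q-learning agent for active recognition might use to decide how to manipulate an object next. The action set and Q-values are hypothetical and are not taken from the GERMS baseline.

    # Hypothetical sketch (not the GERMS baseline itself): epsilon-greedy choice
    # over manipulation actions, given Q-values predicted for the current view.
    import random


    def choose_action(q_values, epsilon=0.1):
        # With probability epsilon explore randomly, otherwise exploit.
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        return max(range(len(q_values)), key=lambda a: q_values[a])


    # Example: Q-values for an assumed action set {rotate left, rotate right, flip, stop}.
    print(choose_action([0.2, 0.7, 0.1, 0.4]))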



Publication year: 2014